1. Introduction

1.1 Agent-based Models

Agent-based models simulate autonomous agents’ interactions in a constrained environment. These models simulate the simultaneous operations and interactions of multiple agents in an attempt to re-create and predict the appearance of complex phenomena (Wikipedia, 2017). One important application area of agent-based models is infectious disease modeling. One of the currently best available open-source method is SPEW (Synthetic Populations and Ecosystems of the World), which can be used to generate synthetic ecosystems and later to be usedfor simulate the spread of diseases among human populations.

However, human is not the only pathogen carrier and transmittor. In epidemiology, a disease vector is any agent (animal, or microorganism) that carries and transmits an infectious pathogen into another living organism. Some of the most notorious infectious diseases, such as Malaria, Lyme Disease, Zika and Dengue are spread by haematophagous anthropods such as Aedes Aegypti (yellow fever mosquito) and Ixodes Scapularis (deer tick) when they feed on blood. This vector modeling system provides a method to model the spread of such non-human vectors which can complement human-based synthetic ecosystems such as SPEW.

1.2 Proposed Approach Towards Synthetic Ecosystems of Disease Vectors

The most prominent challenge of modeling animal disease vectors is the lack of vector distribution data. Unlike human subjects whose presence and daily activities can be tracked through national surveys and registration, it is impractical to count the animal vector population or track them at an individual level. However, we do note that animal population depend much more heavily on environmental conditions and are thus more predictable as compared to humans. Therefore, by leveraging on the close relationship between animal population and environmental factors, this open-source R package provides a way to model species distribution and population sampling on a global scale.

Based on limited observational presense data of a species and global environmental factors, the module uses the Maxent model (Hijmans, Phillips, Leathwick, Elith) to project the species’ distribution over the entire globe, based on a maximum entropy approach. It provides the distribution predictions zoomed into any region, in any monthly period and performs population sampling. This module has made predictions on a global scale with spatial resolution down to \(2.5\ arc\ min^2\) (roughly \(20\ km^2\) at the equator) with some of the most common disease vectors such as rodents, ticks and mosquitos.

One main advantage of this module is its ability to model seasonal changes of species populations, which is vital for modeling common arthropod vectors which are highly sensitive to weather conditions. This module is also highly flexibile in making regional and seasonal predictions. Provided adequate data, user can simulate the distributions of any species in any region and any month of the year. The trade-off for this enhanced flexibility is potential inaccuracies from overgeneralization, as the model does not take into account other geographical and historical reasons which may affect a species population. Therefore, the model is expected to perform with best accuracy when local observation data is provided.

# Load library
library(dismo) # dependencies: raster, sp
library(plyr)

Data

The quality of the species distribution predictions inherently depends on the choice of input data. This section discusses the default input data sources used by this module.

Presence Data

The species presence data refer to the geographical locations (most commonly in longitudes and latitudes) of observed species. The species occurrence data is obtained from the Global Biodiversity Information Facility (GBIF), an international organization focussing on making scientific data on biodiversity available via the Internet. Data available through the GBIF portal are primarily distribution data on plants, animals, fungi, and microbes for the world.

The occurrence data can be drawn through the function get_species_data, which takes in a species name (a vector containing genus and species name) and returns a dataframe containing presence data (including longitudes and latitudes) for the specified species drawn from GBIF.

# function that takes in given species name (a vector containing genus and species name)
# and returns a dataframe of presence data for the specified species
get_species_data <- function(species_name) {
  a = 1
  # name for the dataframe/file containing the presence data for the specific species
  presence_name <- paste(c(species_name, "presence"), collapse="_")
  species_presence_filename <- paste(c(species_name, "presence.rds"), collapse="_")
  
  if (exists(presence_name)) {
    # presence data is already loaded in current session; return that data
    return (get(presence_name))
  } else if (file.exists(species_presence_filename)) {
    # presence data is saved locally; load data and return
    assign(presence_name, readRDS(species_presence_filename), envir=.GlobalEnv)
    return (get(presence_name))
  } else {
    # download presence data from GBIF and save locally
    species_presence <- gbif(genus=species_name[1], species=species_name[2])
    assign(presence_name, species_presence, envir=.GlobalEnv)
    # save data under specific species name
    saveRDS(get(presence_name), species_presence_filename)
    return (get(presence_name))
  }
}

Example: plot mosquito species presence data

# example: plot mosquito species presence data
mosquito_presence <- get_species_data(c("aedes", "aegypti"))
mosquito_location <- mosquito_presence[, c("lon", "lat")]

# install.packages("maptools")
library(maptools)
data(wrld_simpl)
plot(wrld_simpl)
points(mosquito_location, col='blue', cex=0.5)
title("Aedes Aegypti Observations")

Environmental Layers

The environmental factors are drawn from WorldClim, which provides a set of global climate layers (gridded climate data) with the spatial resolution down to \(1\ km^2\). These layers cover environment variables including monthly precipitation and temperature (min, max, mean).

The environmental layers are drawn using the function get_environment_layers, which takes in a resolution (10 arc min, 5 arc min, 2.5 arc min) and draws monthly environment data from WorldClim.

get_environment_layers <- function(resolution=10) {
  climates_datafile_name <- "climates.rds"
  if (exists("climates")) return (climates)
  else if (file.exists(climates_datafile_name)) {
    # climate data is saved locally; load data and return
    return (readRDS(climates_datafile_name))
  } else {
    tmean <- getData('worldclim', var='tmean', res=resolution)
    tmin <- getData('worldclim', var='tmin', res=resolution)
    tmax <- getData('worldclim', var='tmax', res=resolution)
    prec <- getData('worldclim', var='prec', res=resolution)
    climate_jan <- stack(tmean$tmean1, tmin$tmin1, tmax$tmax1, prec$prec1)
    climate_feb <- stack(tmean$tmean2, tmin$tmin2, tmax$tmax2, prec$prec2)
    climate_mar <- stack(tmean$tmean3, tmin$tmin3, tmax$tmax3, prec$prec3)
    climate_apr <- stack(tmean$tmean4, tmin$tmin4, tmax$tmax4, prec$prec4)
    climate_may <- stack(tmean$tmean5, tmin$tmin5, tmax$tmax5, prec$prec5)
    climate_jun <- stack(tmean$tmean6, tmin$tmin6, tmax$tmax6, prec$prec6)
    climate_jul <- stack(tmean$tmean7, tmin$tmin7, tmax$tmax7, prec$prec7)
    climate_aug <- stack(tmean$tmean8, tmin$tmin8, tmax$tmax8, prec$prec8)
    climate_sep <- stack(tmean$tmean9, tmin$tmin9, tmax$tmax9, prec$prec9)
    climate_oct <- stack(tmean$tmean10, tmin$tmin10, tmax$tmax10, prec$prec10)
    climate_nov <- stack(tmean$tmean11, tmin$tmin11, tmax$tmax11, prec$prec11)
    climate_dec <- stack(tmean$tmean12, tmin$tmin12, tmax$tmax12, prec$prec12)

    # save 12-month climates globally
    climates <- c(climate_jan, climate_feb, climate_mar, climate_apr, 
                    climate_may, climate_jun, climate_jul, climate_aug, 
                    climate_sep, climate_oct, climate_nov, climate_dec)
    saveRDS(climates, climates_datafile_name)
    return (climates)
  }
}

Example: plot global mean temperatures throughout the year

Note that the temperature units are 10*Degree Celsius.

# plot mean temperatures throughout the year
par(mfrow=c(1, 4))
environment_data <- get_environment_layers()
tmaxs <- stack()
max_range <- c(0,0)
for (i in 1:12) {
  tmaxs <- stack(tmaxs, environment_data[[i]][[3]])
  max_range <- range(max_range, range(getValues(environment_data[[i]][[3]]), na.rm=T))
}
plot(tmaxs, breaks=seq(from=min(max_range), to=max(max_range), length.out=100), 
     col=rev(heat.colors(99)), legend=F)
plot(tmaxs, legend.only=T, 
     breaks=seq(from=min(max_range), to=max(max_range), length.out=100),
     col=rev(heat.colors(99)),
     axis.args=list(at=seq(max_range[1], max_range[2], length.out=5),
     labels=round(seq(max_range[1], max_range[2], length.out=5)/10,3),
     cex.axis=0.6))

Example: plot environment conditions (mean temperature, min temperature, max temperature and precipitation) for June

Note that the temperature units are 10*Degree Celsius.

# plot environment conditions (mean temperature, min temperature, max temperature and precipitation) for June
max_range_june <- c(0,0)
temperatures <- stack()
environment_june <- environment_data[[6]]
for (i in 1:3) {
  temperatures <- stack(temperatures, environment_june[[i]])
  max_range_june <- range(max_range_june, range(getValues(environment_june[[i]]), na.rm=T))
}

max_range_june <- max_range
# color scale: either using max_range_june or max_range (over the entire year, consistent with the previous graph)
plot(temperatures, 
     breaks=seq(from=min(max_range_june),to=max(max_range_june),length.out=100), 
     col=rev(heat.colors(99)), legend=F)
plot(temperatures, legend.only=T, 
     breaks=seq(from=min(max_range_june), to=max(max_range_june), length.out=100),
     col=rev(heat.colors(99)),
     axis.args=list(at=seq(max_range_june[1], max_range_june[2], length.out=5),
     labels=round(seq(max_range_june[1], max_range_june[2], length.out=5)/10,3),
     cex.axis=0.6))

library(RColorBrewer)
plot(environment_june[[4]], 
     col=colorRampPalette(c('#e4f4fc','#105a7c'), bias=50)(99), main="Precipitation for June (mm)")

Geographical Boundary

Given a specific region of interested, this module gets the geographical boundary of the correpsonding region from Global Administrative Areas (GADM), which is a spatial database of the adminstrative boundaries. Administrative areas in this database are countries and lower level subdivisions such as provinces and counties.

The geographical boundary is obtained through the function get_boundary, which takes in a vector of region name (from national level down to county level, i.e. c(“USA”), c(“USA”,“Pennsylvania”), c(“USA”,“Pennsylvania”, “Allegheny”)), and returns a spatial polygon outlining the region.

get_boundary <- function(region) {
  if (length(region) == 1) {
    target_region <- getData("GADM", country=region, level=0)
  }
  else if (length(region) == 2) {
    nation <- getData("GADM", country=region[1], level=1)
    state <- region[2]
    target_region <- nation[match(toupper(state),toupper(nation$NAME_1)),]
  }
  else if (length(region) == 3) {
    nation <- getData("GADM", country=region[1], level=2)
    state <- region[3]
    target_region <- nation[match(toupper(state),toupper(nation$NAME_2)),]
  }
  return (target_region)
}

Example: plot the region of Pennsylvania

# Plotting the region of Pennsylvania
PA <- get_boundary(c("USA", "Pennsylvania"))
plot(PA)

Example: plot the environment conditions for Pennsylvania

# Plotting the environment conditions for Pennsylvania
PA_environment_6 <- mask(crop(environment_data[[6]], PA), PA)

max_range_june <- c(0,0)
temperatures <- stack()
environment_june <- PA_environment_6
for (i in 1:3) {
  temperatures <- stack(temperatures, environment_june[[i]])
  max_range_june <- range(max_range_june, range(getValues(environment_june[[i]]), na.rm=T))
}

# color scale: either using max_range_june or max_range (over the entire year, consistent with the previous graph)
plot(temperatures, 
     breaks=seq(from=min(max_range_june),to=max(max_range_june),length.out=100), 
     col=rev(heat.colors(99)), legend=F)
plot(temperatures, legend.only=T, 
     breaks=seq(from=min(max_range_june), to=max(max_range_june), length.out=100),
     col=rev(heat.colors(99)),
     axis.args=list(at=seq(max_range_june[1], max_range_june[2], length.out=5),
     labels=round(seq(max_range_june[1], max_range_june[2], length.out=5)/10,3),
     cex.axis=0.6))

library(RColorBrewer)
plot(environment_june[[4]], 
     col=colorRampPalette(c('#e4f4fc','#105a7c'), bias=50)(99), main="Precipitation for June (mm)")

```

Methods

Maxent modeling and prediction

The Maxent software is based on the maximum-entropy approach for species habitat modeling. The model for a species is determined from a set of environmental or climate layers for a set of grid cells in a landscape, together with a set of sample locations where the species has been observed. The computed model is a probability distribution (indicating the level of suitability for species’ living) over all the grid cells.

The distribution chosen is the one that has maximum entropy subject to some constraints: it must have the same expectation for each feature (derived from the environmental layers) as the average over sample locations.

Two probability distributions can be obtained from this model (Elith, Jane, et al. 2011):

Raw Probability:

\[ P_{raw} = \frac{f_1(z)}{f(z)}\] \(f_1(z)\): conditional density at presence sites.

\(f(z)\): marginal density across the study area.

The raw output of the site is the probability that a randomly chosen presence comes from this site, so the sum of values at all sites is 1 for training data. The raw output values are thus typically very small. Ecologically, it can be interpreted as the proportion of the potential distribution that each site represents. This module uses the raw probabilities to sample population for a given region, since the raw output indicates the relative likelihood of finding a species in different cells contained in the region.

Logistic Probability:

\[P_{logit} = logistic (\frac{f_1(z)}{f(z)}) = logistic (P_{raw})\]

The log output is simply a logistic transformation of the raw output, which indicates a relative probability of occurrence. In fact, it could be a real probability of occurrence if the prevalence of the species in the training data was exactly equal to the prevalence of the species in reality across all the study (but this is hard to verify as the true prevalence is generally unknown). This module uses the logistic probabilities to estimate proportionate sample size for each month, since it reflects the actual population size throughout the year.

get_maxent_predict <- function(month, species_presence, climate, species_name) {
  # java setup for maxent modeling
  jar <- paste(system.file(package="dismo"), "/java/maxent.jar", sep='')
  # set prediction output filename ## need to include species (and resolution?)
  predict_filename_logis <- paste(paste(species_name, collapse="_"), "prediction_logis", month, ".asc", sep="_")
  predict_filename_raw <- paste(paste(species_name, collapse="_"), "prediction_raw", month, ".asc", sep="_")
  
  # prepare maxent_predict_raw
  if (file.exists(predict_filename_raw) && file.exists(predict_filename_logis)) {
    maxent_predict_raw <- raster(predict_filename_raw)
    maxent_predict_logis <- raster(predict_filename_logis)
  } else {
    model_month <- maxent(climate, species_presence)
    maxent_predict_raw <- predict(model_month, climate, filename=predict_filename_raw, 
                                  overwrite=T, args=c("outputformat=raw"))
    maxent_predict_logis <- predict(model_month, climate, filename=predict_filename_logis, overwrite=T)
  }
  
  projection(maxent_predict_raw) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
  projection(maxent_predict_logis) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
  return (list(raw=maxent_predict_raw, logis=maxent_predict_logis))
}

Example: predict mosquito distribution in June

# Example: predict mosquito distribution in June
month <- 6
mosquito_month <- mosquito_presence[mosquito_presence$month==month, c("lon", "lat")]
mosquito_month <- subset(mosquito_month, !is.na(lon) & !is.na(lat))
mosquito_predictions_6 <- get_maxent_predict(month, mosquito_month, environment_data[[6]], c("aedes", "aegypti"))
par(mfrow=c(1,2))
plot(mosquito_predictions_6$raw)
title("Aedes Aegypti Distribution Raw Predictions")
plot(mosquito_predictions_6$logis)
title("Aedes Aegypti Distribution Logistic Predictions")

Species Population Sampling

Given a specific region and month, population_sampling function zooms in on the probability distributions in the region, calculates its average logistic probabilities and calculate the corresponding sample size for the given month using the formula:

\[N_{given\ month} = \frac{Mean(logistic\ probabilities\ in\ given\ month)}{Mean(logistic\ probabilities\ in\ reference\ month)}* N_{reference\ month}\] Then \(N_{given\ month}\) samples are drawn from the grid cells in the region according to normalized raw probabilities in the region. This is done by first drawing \(N_{given\ month}\) region’s grids cells with replacement and then add random uniformly distributed within-cell deviations from the cell center.

# helper: population sampling given distribution
population_sampling <- function(maxent_predict_raw, maxent_predict_logis,
                                region=c("ITA"),
                                species, N, month_logis_base, month, 
                                graph_logis=F, graph_raw=F) {
  # get region's shape (sptialpolygons) and extract corresponding grids from the prediction raster
  target_region <- get_boundary(region)
  region_pred_raw <- mask(crop(maxent_predict_raw, target_region), target_region)
  
  region_pred_logis <- mask(crop(maxent_predict_logis, target_region), target_region)
  if (graph_logis) plot(region_pred_logis, main=paste("Logistic Probabilities -", paste(region, collapse=",")))
  region_pred_logis_base <- mask(crop(month_logis_base, target_region), target_region)
  N <- mean(getValues(region_pred_logis), na.rm=T) / mean(getValues(region_pred_logis_base), na.rm=T) * N

  # use raw probabilities from maxent model
  if (graph_raw) plot(region_pred_raw, main=paste("Raw Probabilities -", paste(region, collapse=",")))
  probrast<-region_pred_raw

  # normalize the region probability raster by dividing 
  # by the sum of all probabilities within the region:
  probrast<-probrast/sum(getValues(probrast), na.rm=T)
  
  # a function to sample with replacement N points on a raster, with 
  # inclusion probabilities defined by the raster values
  probsel<-function(probrast, N, x_radius, y_radius){
    # extract values (probabilities) from raster
    x <- getValues(probrast)
    # set NA cells in raster to zero
    x[is.na(x)] <- 0
    # sample with replacement N cells
    samp <- sample(nrow(probrast)*ncol(probrast), replace=T, size=N, prob=x)
    # count the number of times each cell is selected
    freq_table <- count(samp)
    samprast <- probrast
    # set value of sampled cell to its count (selected times)
    samprast[freq_table$x] <- freq_table$freq 
    
    # a function to extract centers of selected grid cells from raster
    get_points <- function(i) (rasterToPoints(samprast, fun=function(x){x>=i}))
    
    # extract and combine cell centers 
    # according to the number of times each gets selected
    points <- do.call(rbind, lapply(1:max(freq_table$freq), get_points))
    
    # add within-cell variation to the coordinates selected
    x_vary <- runif(N, min = -x_radius, max = x_radius)
    y_vary <- runif(N, min = -y_radius, max = y_radius)
    points[,"x"] <- points[,"x"] + x_vary
    points[,"y"] <- points[,"y"] + y_vary
    
    # convert to SpatialPoints
    points<-SpatialPoints(points)
    return(points)
  }

  x_radius <- res(region_pred_raw)[1]/2
  y_radius <- res(region_pred_raw)[2]/2

  start_time <- Sys.time()
  sample_points<-probsel(probrast, N, x_radius, y_radius)
  end_time <- Sys.time()
  return (list(month_samples=sample_points, month_region_log=region_pred_logis, month_N=N))
}

Example: draw a sample of 2000 mosquitos from Pennsylvania

sampling_results <- population_sampling(mosquito_predictions_6$raw,
                                        mosquito_predictions_6$logis,
                                        region=c("USA", "Pennsylvania"),
                                        species=c("aedes", "aegypti"), 
                                        N=2000,
                                        month_logis_base=mosquito_predictions_6$logis, 
                                        month=6, graph_logis=T, graph_raw=T) 

plot(sampling_results$month_region_log)
points(sampling_results$month_samples, cex=0.5, pch=16, col="blue")
title("Logistic Probabilities and Aedes Aegypti Population Sampling for PA")

Simulation

run_simulation takes in a species name, a month, the number of samples to draw for that month, a region and optionally the environment layer resolution. It performs the simulation for the species’ population sampling in the region across different months.

run_simulation <- function(species_name, month, N, region, resolution=10) {
  species_presence <- get_species_data(species_name)
  species_month <- species_presence[species_presence$month==month,c("lon", "lat")]
  species_month <- subset(species_month, !is.na(lon) & !is.na(lat))
  climates <- get_environment_layers(resolution)
    
  # get environment data by month
  climate_month <- climates[[month]]

  # get maxent prediction for base month
  month_predicted_distributions <- get_maxent_predict(month, species_month, climate_month, species_name)
  month_logis_base <- month_predicted_distributions$logis
 
  monthly_logistic_distributions <- stack()
  monthly_samples <- c()
  monthly_N <- c()
  probability_ranges <- list()
  
  # get predictions and corresponding sampling for other month
  for (i in c(1:length(months))) {
    month <- months[i]
    # get species data by month
    species_month <- species_presence[species_presence$month==month,c("lon", "lat")]
    species_month <- subset(species_month, !is.na(lon) & !is.na(lat))
    
    # get environment data by month
    climate_month <- climates[[month]]

    # get maxent prediction
    month_predicted_distributions <- get_maxent_predict(month, species_month, climate_month, species_name)
    month_raw <- month_predicted_distributions$raw
    month_logis <- month_predicted_distributions$logis

    # sample from predicted distribution
    result <- population_sampling(month_raw, month_logis, region, species_name, N, month_logis_base, month)
    month_samples <- result$month_samples
    month_region_log <- result$month_region_log
    monthly_N <- c(monthly_N, result$month_N)
    monthly_logistic_distributions <- stack(monthly_logistic_distributions, month_region_log)
    monthly_samples <- c(monthly_samples, month_samples)
    probability_ranges[[i]] = range(getValues(month_logis), na.rm=T)
  }
  return  (list(log=monthly_logistic_distributions, sample=monthly_samples, 
                range=probability_ranges, N=monthly_N, species_name = species_name,
                region=region))

}

Results

Simulation results for house mice (Mus Musculus) and a type of deer ticks (Ixodes Ricinus Linnaeus) in Pennsylvania, drawing 2000 samples in June.

plot_one_result <- function(result, max_range) {
  par(mfrow=c(4,3))
  for (i in 1:nlayers(result$log)) {
    layer <- result$log[[i]]
    plot(layer, breaks=seq(from=min(max_range), to=max(max_range), length.out=100), 
         col=rev(terrain.colors(99)),
         legend=F,
         main=paste(paste(result$species_name, collapse=" "), as.integer(result$N[i]),
                              "samples -",
                              paste(result$region, collapse=", "), months[i]))
    plot(layer, legend.only=T, 
         breaks=seq(from=min(max_range), to=max(max_range), length.out=100), 
         col=rev(terrain.colors(99)),
          axis.args=list(at=seq(max_range[1], max_range[2], length.out=5),
                      labels=round(seq(max_range[1], max_range[2], length.out=5),3),
                      cex.axis=0.6),
         legend.args=list(text='Logistic Probability', side=4, font=2, line=2.5, cex=0.8))
    plot(result$sample[[i]], add=T, pch=16, cex=0.5, col="blue")
  }
}

plot_results <- function(results) {
  max_range <- c(0,0)
  for (i in 1:length(results)) {
    result = results[[i]]
    max_range <- range(max_range, range(unlist(result$range)))
  }
  for (i in 1:length(results)) {
    result = results[[i]]
    plot_one_result(result, max_range)
  }
}


months <- c(1:12)
month <- 6
N <- 2000
resolution <- 2.5
region1 <- c("USA", "Pennsylvania")
region2 <- c("ITA")
species_name1 <- c("aedes", "aegypti")
species_name2 <- c("aedes", "triseriatus")
species_name3 <- c("lutzomyia", "shannoni") 
species_name4 <- c("Ixodes", "ricinus", "Linnaeus")
species_name5 <- c("mus", "musculus")


result1 <- run_simulation(species_name4, month, N, region1, resolution)
result2 <- run_simulation(species_name5, month, N, region1, resolution)

plot_results(list(result1, result2))

Conclusion

The disease vector modeling system (DVM) provides a method to model the population distribution of common disease vectors such as anthropods, which can then be incorporated into human-based synthetic ecosystems such as SPEW to simulate the spread of infectious diseases across species. Through the Maxent model, the system uses species observation and environment data to predict the species’ distribution over a global environment. It provides distribution predictions and population sampling given any species, any region and any month of the year. It serves as a convenient tool to track seasonal changes in disease vector populations for more accurate simulations.

Moving forward, I plan to further enhance the flexibility of the DVM system by allowing users to supply input data instead of using the default data sources for more updated and relevant results. I will improve the system’s compatability with various input formats for region name and species name. The procedure can be expedited by replacing global predictions with regional predictions, which can save substantial running time especially if users are only interested in a limited number of regions. I will incorporate this prediction approach and provide it as a speed-up option for users. Furthermore, I will search for real population data of common disease vectors. The actual population size will serve as a good reference for general users when deciding on the sample size during population sampling.

References

Agent-based model - Wikipedia. (n.d.). Retrieved April 3, 2017, from https://en.wikipedia.org/wiki/Agent-based_model

Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and distributions, 17(1), 43-57.

Gallagher, S., Richardson, L., Ventura, S. L., & Eddy, W. F. (2017). SPEW: Synthetic Populations and Ecosystems of the World. arXiv preprint arXiv:1701.02383.